1 Requirements

library(TENxMultiomeTools)
library(AllenInstituteBrainData)
library(Signac)
library(Seurat)
library(EnsDb.Mmusculus.v79)
library(scuttle)
library(scater)
library(darioscripts)
library(HDF5Array)
library(RColorBrewer)
source("R/utils.R")

2 Introduction

The TENxMultiomeTools package is a work in progress project intended to facilitate the loading and handling of 10x Genomics multiome data in a SingleCellExperiment format.

This pipeline starts from the output generated by the 10x cellranger-ARC pipeline and is designed for differential analysis of multiple samples in different conditions. We defined it while working on a complex design of 12 different multiome samples of mouse brain cortex in 4 different condition.

Because the data are not published yet, we are using one of the publicly available dataset from the official 10x Genomics website to illustrate the available functionalities of the package and challanges faced with this type of data.

Additionally, because of lack of public datasets with different conditions, we cannot illustrate the full pipeline, but we’ll go through the implemented functionalities and we’ll illustrate possible examples on real data.

We used two replicates of the same mouse brain tissue present on the official 10x Genomics Website.

3 Loading Data

To facilitate 10x Genomics Multiome data loading we can use the read10xMultiome function, which takes the following parameters as input:

  • sample_path: the path of the experiment
  • type: the type of data to load, sparse/hdf5
  • data: data type to load, filtered/raw
  • compressed: logical indicating if the data are stored in a compressed format
  • col.names: logical indicating if to add ID to the cols of the assays.
  • addfrags: logical indicating if the path to the fragments file has to be saved in the sce. Note that this is needed when computing the metrics in the next step.
  • reference: character indicating the reference assay to use as main assay

This function loads the data in a SingleCellExperiment format, creating a main assay and an altExp assay, which is another SingleCellExperiment object. The colData of the main assay will always contain the information related to the cells of the experiment, so it will not be replicated twice, but only stored in the main one. On the other hand the information related to the rows of the assays differ because for the RNA a rowData DataFrame stores information about the genes, while for the ATAC data it will be in a rowRanges format. This format allows to store the information about the peaks in a GenomicRanges format where the genomic coordinates are in the form of Chromosome, Start and End while additional data are stored in the mcols DataFrame.

N.B. The read10xMultiome function will be deprecated in favor of the exported functionalities of the TENxIO package.

library(TENxMultiomeTools)
scelist <- lapply(c("multiome_website_mouse_old", 
                    "multiome_website_mouse_old_rep2"),    
                read10xMultiome, 
                type="HDF5", 
                addfrags=TRUE)

names(scelist) <- c("Brain_R1", "Brain_R2")
saveExpList(scelist, postfix="10xBrain")
n <- 50
palette = brewer.pal.info[brewer.pal.info$category == 'qual',]
palette = unlist(mapply(brewer.pal, palette$maxcolors, rownames(palette)))

scelist <- loadExpList(postfix="10xBrain")
## Reading RDS/scelist_10xBrain files
names(scelist) <- c("Brain_R1", "Brain_R2")
sce <- scelist[[1]]
sce
## class: SingleCellExperiment 
## dim: 32285 4878 
## metadata(2): Samples path
## assays(1): counts
## rownames(32285): ENSMUSG00000051951 ENSMUSG00000089699 ...
##   ENSMUSG00000095019 ENSMUSG00000095041
## rowData names(6): ID Symbol ... Start End
## colnames: NULL
## colData names(2): Sample Barcode
## reducedDimNames(0):
## mainExpName: RNA
## altExpNames(1): ATAC
colData(sce)
## DataFrame with 4878 rows and 2 columns
##                      Sample            Barcode
##                 <character>        <character>
## 1    multiome_website_mou.. AAACAGCCAACCGCCA-1
## 2    multiome_website_mou.. AAACAGCCAAGGTCGA-1
## 3    multiome_website_mou.. AAACAGCCAGGAACAT-1
## 4    multiome_website_mou.. AAACAGCCATATTGAC-1
## 5    multiome_website_mou.. AAACAGCCATCAGCAC-1
## ...                     ...                ...
## 4874 multiome_website_mou.. TTTGTGGCATTTGCTC-1
## 4875 multiome_website_mou.. TTTGTGTTCAATGACC-1
## 4876 multiome_website_mou.. TTTGTTGGTAGACAAA-1
## 4877 multiome_website_mou.. TTTGTTGGTGGAGCAA-1
## 4878 multiome_website_mou.. TTTGTTGGTTAGAGCC-1
rowData(sce)
## DataFrame with 32285 rows and 6 columns
##                                    ID      Symbol            Type         Chr
##                           <character> <character>     <character> <character>
## ENSMUSG00000051951 ENSMUSG00000051951        Xkr4 Gene Expression        chr1
## ENSMUSG00000089699 ENSMUSG00000089699      Gm1992 Gene Expression        chr1
## ENSMUSG00000102331 ENSMUSG00000102331     Gm19938 Gene Expression        chr1
## ENSMUSG00000102343 ENSMUSG00000102343     Gm37381 Gene Expression        chr1
## ENSMUSG00000025900 ENSMUSG00000025900         Rp1 Gene Expression        chr1
## ...                               ...         ...             ...         ...
## ENSMUSG00000095523 ENSMUSG00000095523  AC124606.1 Gene Expression  JH584299.1
## ENSMUSG00000095475 ENSMUSG00000095475  AC133095.2 Gene Expression  JH584299.1
## ENSMUSG00000094855 ENSMUSG00000094855  AC133095.1 Gene Expression  JH584299.1
## ENSMUSG00000095019 ENSMUSG00000095019  AC234645.1 Gene Expression  JH584303.1
## ENSMUSG00000095041 ENSMUSG00000095041  AC149090.1 Gene Expression  JH584304.1
##                        Start       End
##                    <numeric> <numeric>
## ENSMUSG00000051951   3671497   3671498
## ENSMUSG00000089699   3466586   3466587
## ENSMUSG00000102331   3658903   3658904
## ENSMUSG00000102343   3985983   3986215
## ENSMUSG00000025900   4360313   4409241
## ...                      ...       ...
## ENSMUSG00000095523    837363    837364
## ENSMUSG00000095475    913082    913083
## ENSMUSG00000094855    921941    921942
## ENSMUSG00000095019     81606     81607
## ENSMUSG00000095041     59666     59690
altExp(sce)
## class: SingleCellExperiment 
## dim: 172193 4878 
## metadata(2): Samples fragpath
## assays(1): counts
## rownames(172193): chr1:3060610-3061485 chr1:3094582-3095477 ...
##   JH584292.1:12506-13421 JH584295.1:1378-1976
## rowData names(3): ID Symbol Type
## colnames: NULL
## colData names(2): Sample Barcode
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
rowRanges(altExp(sce))
## GRanges object with 172193 ranges and 3 metadata columns:
##                            seqnames          ranges strand |
##                               <Rle>       <IRanges>  <Rle> |
##     chr1:3060610-3061485       chr1 3060610-3061485      * |
##     chr1:3094582-3095477       chr1 3094582-3095477      * |
##     chr1:3113326-3114208       chr1 3113326-3114208      * |
##     chr1:3119674-3120516       chr1 3119674-3120516      * |
##     chr1:3121300-3122029       chr1 3121300-3122029      * |
##                      ...        ...             ...    ... .
##   GL456216.1:41132-41904 GL456216.1     41132-41904      * |
##   GL456216.1:43950-44762 GL456216.1     43950-44762      * |
##   GL456216.1:48736-49596 GL456216.1     48736-49596      * |
##   JH584292.1:12506-13421 JH584292.1     12506-13421      * |
##     JH584295.1:1378-1976 JH584295.1       1378-1976      * |
##                                              ID                 Symbol
##                                     <character>            <character>
##     chr1:3060610-3061485   chr1:3060610-3061485   chr1:3060610-3061485
##     chr1:3094582-3095477   chr1:3094582-3095477   chr1:3094582-3095477
##     chr1:3113326-3114208   chr1:3113326-3114208   chr1:3113326-3114208
##     chr1:3119674-3120516   chr1:3119674-3120516   chr1:3119674-3120516
##     chr1:3121300-3122029   chr1:3121300-3122029   chr1:3121300-3122029
##                      ...                    ...                    ...
##   GL456216.1:41132-41904 GL456216.1:41132-41904 GL456216.1:41132-41904
##   GL456216.1:43950-44762 GL456216.1:43950-44762 GL456216.1:43950-44762
##   GL456216.1:48736-49596 GL456216.1:48736-49596 GL456216.1:48736-49596
##   JH584292.1:12506-13421 JH584292.1:12506-13421 JH584292.1:12506-13421
##     JH584295.1:1378-1976   JH584295.1:1378-1976   JH584295.1:1378-1976
##                                 Type
##                          <character>
##     chr1:3060610-3061485       Peaks
##     chr1:3094582-3095477       Peaks
##     chr1:3113326-3114208       Peaks
##     chr1:3119674-3120516       Peaks
##     chr1:3121300-3122029       Peaks
##                      ...         ...
##   GL456216.1:41132-41904       Peaks
##   GL456216.1:43950-44762       Peaks
##   GL456216.1:48736-49596       Peaks
##   JH584292.1:12506-13421       Peaks
##     JH584295.1:1378-1976       Peaks
##   -------
##   seqinfo: 26 sequences from an unspecified genome; no seqlengths

4 Create Annotation

The 10x Multiome default output comes with an annotation file for the detected genes, but this file could be very heavy to store in memory and to manipulate. A way to avoid this is to create an annotation by aid of an EnsDb package, which downloads the annotation data for an entire genome.

ensdb <- EnsDb.Mmusculus.v79
seqlevelsStyle(ensdb) <- "UCSC"
annotation <- GetGRangesFromEnsDb(ensdb = ensdb)
genome(annotation) <- "mm10"

5 Quality Control

Once that we have the data loaded and an annotation object in GenomicRanges format, we can start by computing some quality control metrics.

At the moment we defined three quality controls approaches, the Signac metrics, the scuttle::isOutlier and the doublets detection implemented in the scDblFinder package, but we’ll focus on the Signac metrics for this tutorial.

5.1 Signac Metrics

At the moment we implemented a wrapper for the Signac package (available on CRAN), which allows to compute multiple metrics both on RNA and ATAC data.

The metrics computed by Signac are defined as follows:

  • number of counts per each gene for the RNA and each feature for the ATAC
  • the number of features on the RNA
  • computing the ratio of fragments between 147 bp and 294 bp (mononucleosome) to fragments < 147 bp (nucleosome-free)
  • computing the TSS enrichment score as defined by Encode:
    • normalized score on the number of reads around (2000bp up/down) the TSS
scelist <- lapply(scelist, computeSignacMetrics, annotation)
## Computing hash
## Extracting TSS positions
## Extracting fragments at TSSs
## 
## Computing TSS enrichment score
## Computing hash
## Extracting TSS positions
## Extracting fragments at TSSs
## 
## Computing TSS enrichment score
saveExpList(scelist, postfix="10xBrain_metrics")
## RDS/scelist_10xBrain_metrics wrote on disk (for 2 elements)
## [[1]]
## [1] "RDS/scelist_10xBrain_metrics_1"
## 
## [[2]]
## [1] "RDS/scelist_10xBrain_metrics_2"

5.2 Compute filters on the previous metrics

Once we have the Signac metrics computed, we can use them to filter-out low-quality cells. To do so, we decided to compute the quantiles of the distribution of each metric.

We defined the low-quality cells as the ones that have at the same time all these checks passed: + number of counts per genes and features lower than the lowest quantile (default 5%) and higher than the highest quantile (default 95%). + number of features on the RNA lower than the lowest quantile (default 5%) and higher than the highest quantile(default 95%). + nucleosome signal lower than 2 + TSS enrichment lower that 1

We can choose this thresholds by hand with the lowQuantThr, highQuantThr, nucleosomeThr and TSSThr, parameters in the computeFilterCellsMetrics function.

Of course the default parameters can be chosen with an heuristic approach.

scelist <- loadExpList(postfix="10xBrain_metrics")
## Reading RDS/scelist_10xBrain_metrics files
scelist <- lapply(scelist, function(sce)
{
    computeFilterCellsMetrics(sce, metric="quantile",
        lowQuantThr="5%", highQuantThr="90%",
        nucleosomeThr=2, TSSThr=1)
})

5.3 Quality Control Plots

We can take a look to the distributions and the labeled cells with the plotFilteredCells function, which shows violin plots for all the metrics computed and colors the cells based on the quality control passed/not passed label.

The arguments of this function allow to plot the filtered-in/out cells with the inout parameter. Of course we are mainly interested in the filtered-out cells because we want to better understand if there is something important the we’re filtering out.

It is also possible to pass a custom column with a user-defined column name with TRUE/FALSE values indicating the kept/filtered cells, by aid of the filterCellsBy and customColName parameters.

lapply(scelist, function(sce)
{
    plotFilteredCells(sce, name="Signac Metrics", inout="out")
})
## [[1]]

## 
## [[2]]

5.4 Filtering Cells

Once we retain satisfied with the Quality Controls we can proceed by filtering the cells and to normalize the data.

We highlight that the in_signac column is automatically computed in the computeFilterCellsMetrics and it is a logical AND across all the metrics described in the previous “Signac” steps.

scelist <- lapply(scelist, function(sce) {
    sce <- sce[,sce$in_signac]
    sce <- logNormCounts(sce)
    sce <- runPCA(sce)
    sce <- runTSNE(sce, dimred="PCA")
    sce
})

6 Assign Cell Labels

To assign cell labels there are several methods available in literature, some of which requires a reference dataset which helps to map the cell labels from the reference to our dataset.

At the moment there are not so many reliable reference dataset easisly available, in particular with validated cell types and, obviously, not for any kind of tissue we could need.

6.1 Cell Types Reference

For this tutorial, we are using a reference dataset from the Allen Institute, available from the AllenInstituteBrainData package https://github.com/drighelli/AllenInstituteBrainData which allows to download an annotated dataset of ~1 Million mouse brain cells annotated at three different levels of taxonomy.

This reference has three levels of annotation graularity:

  • the cluster level is the level where the cells have been clustered in
  • the subclass level is the level where the cells have been validated thank to cell-type markers
  • the class level is the level where the cells have been grouped together because of cell-types belonging to the same cell family.

After downloading the dataset, it could be useful to create pseudo-bulk counts to speed up our computations. Using a reference of 1 Million cells could take very long time during the process of cell type annotation, and from our tests using the pseudo-bulk counts is a pretty good approximation in terms of the obtained results.

allen <- AllenInstituteBrainData("Allen_Mouse_2020")
allen <- aggregateAcrossCells(allen, use.assay.type = "counts",id=DataFrame(label=allen$subclass_label))
allen <- logNormCounts(allen)
saveRDS(allen, "RDS/Allen_pseudo.RDS")

We just load the already computed reference for lack of time.

allen <- readRDS("RDS/Allen_pseudo.RDS")

6.2 Assigning labels to the cells

At this point we can simply assign the labels from the reference to our dataset. At the moment, we created a wrapper assignLabels around the SingleR package, but, as already mentioned, there is plenty of methods that can be used for this scope.

Indeed, further versions of this function will implement other methods, i.e. Azimuth from Rahul Satija lab.

scelist <- lapply(scelist, function(sce) { 
    assignLabels(sce, allen, "subclass_label")
})
saveExpList(scelist, postfix="10xBrain_lblSR")
## RDS/scelist_10xBrain_lblSR wrote on disk (for 2 elements)
## [[1]]
## [1] "RDS/scelist_10xBrain_lblSR_1"
## 
## [[2]]
## [1] "RDS/scelist_10xBrain_lblSR_2"

6.3 Visualizing cell types

Once we have the cell types labeled for our experiment, we are interested in understanding their clustering and this can be easily done with the plotTSNE function of the scater package.

scelist <- loadExpList(postfix="10xBrain_lblSR")
## Reading RDS/scelist_10xBrain_lblSR files
lapply(scelist, function(sce){table(sce$SingleR)})
## [[1]]
## 
##         Astro      CA1-ProS           CA3          Car3            CR 
##           309            45             7             5          1849 
##        CT SUB            DG          Endo         IG-FC   L2  IT ENTl 
##            22             2             7           258             4 
##    L2 IT ENTm   L2/3 IT CTX  L2/3 IT ENTl   L2/3 IT PPP   L2/3 IT RHP 
##             1            21            10             7            40 
##   L4/5 IT CTX L5 IT TPE-ENT     L5 NP CTX        L5 PPP     L5 PT CTX 
##             1             2             9             3            51 
##     L6 CT CTX    L6 IT ENTl       L6b CTX    L6b/CT ENT         Lamp5 
##             6             6            39            15            51 
##         Meis2     Micro-PVM        NP PPP        NP SUB         Oligo 
##           136            11            29            30           183 
##         Pvalb      SMC-Peri          Sncg           Sst     Sst Chodl 
##            15             6            70           150            35 
##      SUB-ProS           Vip          VLMC 
##             2           135             5 
## 
## [[2]]
## 
##         Astro      CA1-ProS           CA3          Car3            CR 
##           308            45             7             5          1848 
##        CT SUB            DG          Endo         IG-FC   L2  IT ENTl 
##            22             2             6           258             4 
##    L2 IT ENTm   L2/3 IT CTX  L2/3 IT ENTl   L2/3 IT PPP   L2/3 IT RHP 
##             1            21            10             8            40 
##   L4/5 IT CTX L5 IT TPE-ENT     L5 NP CTX        L5 PPP     L5 PT CTX 
##             1             2             9             3            51 
##     L6 CT CTX    L6 IT ENTl       L6b CTX    L6b/CT ENT         Lamp5 
##             6             6            39            15            51 
##         Meis2     Micro-PVM        NP PPP        NP SUB         Oligo 
##           135            11            29            30           180 
##         Pvalb      SMC-Peri          Sncg           Sst     Sst Chodl 
##            15             6            69           150            34 
##      SUB-ProS           Vip          VLMC 
##             2           135             5
ggp <- lapply(scelist, plotTSNE, colour_by="SingleR") 
ggp[[1]] + scale_color_manual(values=palette)
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

ggp[[2]] + scale_color_manual(values=palette)
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

7 Doublets

Another Quality Control on the cells can be done by computing doublets scores. We implemented a wrapper on the scDblFinder package, because it offers multiple methods for doublets detection. In particular, the doublet score for each cell is based on the density of simulated doublets around it.

Even if the results provided by this package are still under verification, we suggest to use it to better understand the quality of the cells in the experiment.

Actually, we implemented the method for the scRNAseq data, but further implementations will take into account methods for scATACseq data and further investigations need to be done on how to apply these methods on both assays.

scelist <- lapply(scelist, logNormCounts)
scelist <- lapply(scelist, computeDoublets, method="griffiths")
## Warning in sweep(sim.pcs, 2L, mean.correction, FUN = "-", check.margin = FALSE): 'check.margin' is ignored when 'x' is a DelayedArray object or
##   derivative

## Warning in sweep(sim.pcs, 2L, mean.correction, FUN = "-", check.margin = FALSE): 'check.margin' is ignored when 'x' is a DelayedArray object or
##   derivative
saveExpList(scelist, postfix="10xBrain_doublets")
## RDS/scelist_10xBrain_doublets wrote on disk (for 2 elements)
## [[1]]
## [1] "RDS/scelist_10xBrain_doublets_1"
## 
## [[2]]
## [1] "RDS/scelist_10xBrain_doublets_2"

It is possible to visualize these results with a simple TSNE plot, but better investigations can be done when cell types labels will be assigned to our cell and looking where the doublets fall in our dataset.

scelist <- loadExpList(postfix="10xBrain_doublets")
## Reading RDS/scelist_10xBrain_doublets files
scelist <- lapply(scelist, runPCA)
scelist <- lapply(scelist, runTSNE, dimred="PCA")
lapply(scelist, plotTSNE, colour_by="dbl.scores")
## [[1]]

## 
## [[2]]

lapply(scelist, plotTSNE, colour_by="dbl.calls")
## [[1]]

## 
## [[2]]

We can use this information to remove doublets from the dataset.

scelist <- lapply(scelist, function(sce){sce[,sce$dbl.calls=="singlet"]})

For teaching purposes, we now subset the celltypes to the ones with the higher amount of cells.

celltypes <- c("Astro", "Oligo")
scelist <- lapply(scelist, function(sce){sce[,sce$SingleR %in% celltypes]})

8 Differential Analysis

We selected two main approaches for doing differential analysis of these data, by applying muscat (working on single cell data) and by using edgeR (working on bulk data).

Because the second one produced better results on our datasets, we are going to illustrate the steps that can be performed to apply edgeR on single cell data to obtain differentially expressed genes (RNA) and differential accessible regions (ATAC).

To apply bulk analysis methods on single cell data, these need to be transformed in pseudo-bulk counts, and to do so there are multiple methods able to do that.

NB: We are not pointing to not use muscat (or other methods), but to always compare the results produced by multiple approaches to better investigate your data.

8.1 Pseudobulk

A possible approach to obtain pseudo-bulk data from your single-cell is to use the aggregateAcrossCells method from the scuttle package (as previously done for the reference dataset).

To apply this method we simply have to define a named vector of the cell types to use for the aggregation of the cells. In our case, we saved this information in the colData of our SingleCellExperiment under the SingleR column.

This approach is pretty straightforward when working with a scRNAseq because the genes are well defined even when working with multiple experiments.

In the end, we will have a new SingleCellExperiment with a reduced number of columns, which, in turn, will be of the same number of cell types defined in our SingleR colData column.

scelist <- lapply(scelist, swapAltExp, name="RNA")
scelistps <- lapply(scelist, function(sce){
    applySCE(sce, aggregateAcrossCells, use.assay.type="counts",
        id=DataFrame(label=sce$SingleR))
    })

saveRDS(scelistps, "RDS/scelistps_brain.RDS")

At this point it is possible to apply and visualize the data like bulk data.

scelistps <- readRDS("RDS/scelistps_brain.RDS")

lapply(scelistps, function(sce)
{
    plotPCAs(counts(sce), colData=colData(sce), shapeBy="SingleR",
         colorBy="SingleR", returnOnly=TRUE)
})
## $Brain_R1

## 
## $Brain_R2

9 ATAC Analysis

The same operation for pseudo bulk can be made on a single ATAC experiment obtaining the same number of columns as for the RNA.

scelistps <- lapply(scelistps, swapAltExp, name="ATAC")
head(counts(scelistps[[1]]))
##                      Astro  CR Oligo
## chr1:3060610-3061485     1  37     0
## chr1:3094582-3095477    41 235    29
## chr1:3113326-3114208     2  81     0
## chr1:3119674-3120516    11 266     0
## chr1:3121300-3122029     6  59     4
## chr1:3180877-3181781     2  27     0
rowRanges(scelistps[[1]])
## GRanges object with 172193 ranges and 3 metadata columns:
##                            seqnames          ranges strand |
##                               <Rle>       <IRanges>  <Rle> |
##     chr1:3060610-3061485       chr1 3060610-3061485      * |
##     chr1:3094582-3095477       chr1 3094582-3095477      * |
##     chr1:3113326-3114208       chr1 3113326-3114208      * |
##     chr1:3119674-3120516       chr1 3119674-3120516      * |
##     chr1:3121300-3122029       chr1 3121300-3122029      * |
##                      ...        ...             ...    ... .
##   GL456216.1:41132-41904 GL456216.1     41132-41904      * |
##   GL456216.1:43950-44762 GL456216.1     43950-44762      * |
##   GL456216.1:48736-49596 GL456216.1     48736-49596      * |
##   JH584292.1:12506-13421 JH584292.1     12506-13421      * |
##     JH584295.1:1378-1976 JH584295.1       1378-1976      * |
##                                              ID                 Symbol
##                                     <character>            <character>
##     chr1:3060610-3061485   chr1:3060610-3061485   chr1:3060610-3061485
##     chr1:3094582-3095477   chr1:3094582-3095477   chr1:3094582-3095477
##     chr1:3113326-3114208   chr1:3113326-3114208   chr1:3113326-3114208
##     chr1:3119674-3120516   chr1:3119674-3120516   chr1:3119674-3120516
##     chr1:3121300-3122029   chr1:3121300-3122029   chr1:3121300-3122029
##                      ...                    ...                    ...
##   GL456216.1:41132-41904 GL456216.1:41132-41904 GL456216.1:41132-41904
##   GL456216.1:43950-44762 GL456216.1:43950-44762 GL456216.1:43950-44762
##   GL456216.1:48736-49596 GL456216.1:48736-49596 GL456216.1:48736-49596
##   JH584292.1:12506-13421 JH584292.1:12506-13421 JH584292.1:12506-13421
##     JH584295.1:1378-1976   JH584295.1:1378-1976   JH584295.1:1378-1976
##                                 Type
##                          <character>
##     chr1:3060610-3061485       Peaks
##     chr1:3094582-3095477       Peaks
##     chr1:3113326-3114208       Peaks
##     chr1:3119674-3120516       Peaks
##     chr1:3121300-3122029       Peaks
##                      ...         ...
##   GL456216.1:41132-41904       Peaks
##   GL456216.1:43950-44762       Peaks
##   GL456216.1:48736-49596       Peaks
##   JH584292.1:12506-13421       Peaks
##     JH584295.1:1378-1976       Peaks
##   -------
##   seqinfo: 26 sequences from an unspecified genome; no seqlengths

10 ATAC Replicates Consensus Peaks

But things become a little bit more complicated when having multiple ATAC experiments, coming from different conditions, it could be problematic to compute a pseudo-bulk counts, because of the differences across the peaks (each experiment has different number of peaks and coordinates).

We can use the DEScan2 already implemented functionalities for bulk data, but with some precautions.

11 Creating BAM files per cell types

To properly construct a new count matrix with DEScan2 for our consensus peaks, we need the BAM files organized per cell types.

We can use the 10xMultiomeTools::createBamCt function to do this operation.

This function requires a few external software to operate properly: + sinto: a python software to create BAM files per each cell type starting from a list of barcodes (the ones of the specified cell type) + samtools: useful to sort and index the new BAM files

Parameters Details:

  • sce: An SCE object
  • sampleName: the name of the sample
  • cellType: the name of the cell type to process
  • cellTypesCol: the name of che colData colname where to get the cell types
  • sort: it sorts the bam file with samtools
  • bamdir: the directory where to get the input bams, if you used the read10xMultiome function this information is stored in the metadata(sce)$path
  • bamType: one of ATAC, GEX, both to process one of the assays or both
  • outdir: the new bam output directory (automatically created)
  • ncores: the number of cores to use

NB This is not needed if you already have the BAM files organized by cell types.

scelist <- lapply(scelist, function(sce){
    path <- lapply(unique(sce$SingleR), function(ct){
        createBamCt(sce, cellType=ct, 
            cellTypesCol="SingleR", sort=TRUE,
            bamdir=metadata(sce)$path, bamType="ATAC", 
            outdir="bams_allct", ncores=10)
    })
    metadata(sce)$ct_bams <- unique(unlist(path))
    sce
})

12 DEScan2 package

The DEScan2 package was developed to work with genomic bulk data, it implements three major steps:

  • Peak Calling: findPeaks a peak caller for identify the regions present in the BAM/BED files of the experiment (not suggested for single cell data)
  • consensus peaks: finalRegions given a list of GenomicRanges (GR) identifies the common peaks across the given list and returns a new (GR) with the consensus list of the found peaks.
  • count matrix construction: countFinalRegions starting from a GR and a directory with BAM files, it constructs a new matrix of counts with the peaks on the rows and the bamfiles (samples) on the column.

12.1 A few details

Before working with these functions we first provide a few details to better understand how to setup them:

  • finalRegions: ++ peakSamplesGRangesList: a GenomicRangesList, each element of the list is a GR, associated to a sample. ++ zThreshold: a numeric value allowing to remove all the peaks with a score less then it. We set this to 0 if we want to keep all the peaks. ++ minCarriers: a numeric value allowing to keep all the peaks present at least in minCarriers samples. ++ scorecolname: character indicating the name of the column where the peaks score are stored. ++ Returns: a GR with the consensus peaks and some additional info +++ k-carriers: number of samples where the peak has been found (an interval between [minCarriers, length(peakSamplesGRangesList)]) +++ n-peaks: the total number of peaks where the new consensus peak is originating from

  • countFinalRegions: ++ regionsGRanges: a GR, ideally the GR of our consensus peak list, but it can be any GR. ++ readsFilePath: the filepath of BAM or BED files to compute the coverage of the peaks. ++ fileType: one of BAM or BED ++ minCarriers: same as for finalRegions, the minimum number of samples where the peak needs to be present, otherwise it will be discarded. ++ Returns: a SummarizedExperiment with the peaks on the rowRanges and a new matrix of counts on the main assay.

12.2 Working with single cell ATAC data

DEScan2 finalRegions needs a score column for working properly, but when working with 10x default peaks this score is not provided.

A possible work-around can be done by using the counts/logcounts assay for the ATAC data.

By aid of the 10xMultiome buildATACScores we are able to create a new score column in the rowRanges of the SCE and use this column in the DEScan2 finalRegions function.

The score is defined as the sum of the counts of all the Barcodes assigned to a specific cell type.

To compute this for all the cell types we include these operations in a loop over all the celltypes present in the SingleCellExperiments.

library(DEScan2)

grlct <- lapply(celltypes, function(ct)
{
    grl <- lapply(scelist, function(sce) {
        message("computing atac scores")
        # this function helps to create a score for the CT
        sce <- buildATACScores(sce, cellType=ct)
        rowRanges(altExp(sce))
    }) 
    message("Computing final regions")
    # we now work only with the rowRanges of both the experiments
    gr <- finalRegions(GRangesList(grl), 
        zThreshold=0, minCarriers=1, saveFlag=FALSE, 
        scorecolname="score", verbose=TRUE,
        BPPARAM=BiocParallel::MulticoreParam(workers=1))
    
    ## moving bam in a dedicated folder per each cell type
    bams <- list.files("bams_allct",  pattern="*sorted.bam*", full.names=TRUE)
    bams <- bams[grep(ct, bams)] # just to be sure to take only the ones of the ct
    ctpath <- file.path("bams_allct", ct)
    if(!dir.exists(ctpath)) dir.create(ctpath, showWarnings=FALSE)
    file.rename(bams, file.path(ctpath, basename(bams)))
    metadata(gr)$bam_path <- ctpath
    gr
})
names(grlct) <- celltypes

saveRDS(grlct, file="RDS/grlct_celltypes_brain.RDS")

We inspect the number of consensus peaks list present in each celltype to compare this information across all the celltypes. Obviously, in this case the starting samples are the same for all the celltypes so we obtain the same number of peaks. This changes when we work with samples of different conditions.

The main aim of this plot is to understand how many peaks we have across the samples present in our experiment.

Typically the number of peaks decreases with the increasing of the samples, allowing us to suddenly understand how many peaks we want to work with
during the further analysis.

## Create plot with number of peaks and "carriers" (Celltypes in this case
grlct <- readRDS("RDS/grlct_celltypes_brain.RDS")
plotPeaksSamplesDEScan2(grlct)

12.3 Computing Count Matrix

Once we have the consensus peak list, we can finally construct the new count matrix with countFinalRegions function, passing as input the GR of the peaks and the directory of BAM files, producing as output a SummarizedExperiment.

grlctSE <- lapply(seq_along(grlct), function(i)
{
    gr <- grlct[[i]]
    ct <- names(grlct)[i]
    
    message("Computing count final regions for ", ct)
    grSE <- countFinalRegions(gr, readsFilePath=metadata(gr)$bam_path,
                         fileType="bam", minCarriers=1)
    colnames(grSE) <- basename(colnames(grSE))
    grSE
})
names(grlctSE) <- names(grlct)
saveRDS(grlctSE, file="RDS/grlctSE_brain.RDS")

Just a brief look at the results

grlctSE <- readRDS("RDS/grlctSE_brain.RDS")
head(assay(grlctSE[[1]]))
##                    multiome_website_mouse_old_rep2.h5_Astro_sorted.bam
## chr1.p00001_np1_k1                                                   4
## chr1.p00002_np1_k1                                                  37
## chr1.p00003_np1_k1                                                   8
## chr1.p00004_np1_k1                                                  24
## chr1.p00005_np1_k1                                                  75
## chr1.p00006_np1_k1                                                  10
##                    multiome_website_mouse_old.h5_Astro_sorted.bam
## chr1.p00001_np1_k1                                              4
## chr1.p00002_np1_k1                                             37
## chr1.p00003_np1_k1                                              8
## chr1.p00004_np1_k1                                             28
## chr1.p00005_np1_k1                                             78
## chr1.p00006_np1_k1                                             10
rowRanges(grlctSE[[1]])
## GRanges object with 174565 ranges and 6 metadata columns:
##                           seqnames            ranges strand |
##                              <Rle>         <IRanges>  <Rle> |
##      chr1.p00001_np1_k1       chr1   6757636-6758045      * |
##      chr1.p00002_np1_k1       chr1   9619486-9620003      * |
##      chr1.p00003_np1_k1       chr1 10836987-10837736      * |
##      chr1.p00004_np1_k1       chr1 13134151-13134903      * |
##      chr1.p00005_np1_k1       chr1 13152492-13153338      * |
##                     ...        ...               ...    ... .
##   GL456216.1.p11_np3_k2 GL456216.1       40230-42104      * |
##   GL456216.1.p12_np2_k2 GL456216.1       43750-44962      * |
##   GL456216.1.p13_np2_k2 GL456216.1       48536-49796      * |
##    JH584292.1.p1_np2_k2 JH584292.1       12306-15068      * |
##    JH584295.1.p1_np2_k2 JH584295.1            0-2176      * |
##                                             ID                 Symbol
##                                    <character>            <character>
##      chr1.p00001_np1_k1   chr1:6757836-6757845   chr1:6757836-6757845
##      chr1.p00002_np1_k1   chr1:9619686-9619803   chr1:9619686-9619803
##      chr1.p00003_np1_k1 chr1:10837187-10837536 chr1:10837187-10837536
##      chr1.p00004_np1_k1 chr1:13134351-13134703 chr1:13134351-13134703
##      chr1.p00005_np1_k1 chr1:13152692-13153138 chr1:13152692-13153138
##                     ...                    ...                    ...
##   GL456216.1.p11_np3_k2                   <NA>                   <NA>
##   GL456216.1.p12_np2_k2                   <NA>                   <NA>
##   GL456216.1.p13_np2_k2                   <NA>                   <NA>
##    JH584292.1.p1_np2_k2                   <NA>                   <NA>
##    JH584295.1.p1_np2_k2                   <NA>                   <NA>
##                                Type     score   n-peaks k-carriers
##                         <character> <numeric> <numeric>  <numeric>
##      chr1.p00001_np1_k1       Peaks         0         1          1
##      chr1.p00002_np1_k1       Peaks        10         1          1
##      chr1.p00003_np1_k1       Peaks         4         1          1
##      chr1.p00004_np1_k1       Peaks        11         1          1
##      chr1.p00005_np1_k1       Peaks        28         1          1
##                     ...         ...       ...       ...        ...
##   GL456216.1.p11_np3_k2        <NA>   18.3333         3          2
##   GL456216.1.p12_np2_k2        <NA>   14.5000         2          2
##   GL456216.1.p13_np2_k2        <NA>    4.5000         2          2
##    JH584292.1.p1_np2_k2        <NA>   12.0000         2          2
##    JH584295.1.p1_np2_k2        <NA>  104.5000         2          2
##   -------
##   seqinfo: 26 sequences from an unspecified genome; no seqlengths

13 Differential Accessible Regions Analysis

Because the previous dataset was only on a single condition, with two replicates that are pratically the same sample, it was impossible to find Differential Accessible Regions.

To help investigate this aspect we are going to use another dataset from the 10x Genomics official website, with multiple already aggregated samples.

sce1 <- loadHDF5SummarizedExperiment("sce_Alzheimer_Ann_Tsn")
plotTSNE(sce1, colour_by="SingleR") + theme_bw() + scale_color_manual(values = palette)
## Scale for colour is already present.
## Adding another scale for colour, which will replace the existing scale.

plotTSNE(sce1, colour_by="condition") + theme_bw() 

library(darioscripts)
sce57aATAC <- readRDS("RDS/sce_alz_57_ps_ATAC.RDS")
sce57aATAC$replicate <- gsub("rep6", "rep3", sce57aATAC$replicate)
sce57aATAC$cnames <- paste0(sce57aATAC$SingleR, "_", sce57aATAC$genotype, "_" , sce57aATAC$replicate)
colnames(sce57aATAC) <- sce57aATAC$cnames

plotPCAs(counts(sce57aATAC),colData=colData(sce57aATAC), shapeBy="genotype", colorBy="SingleR", returnOnly=TRUE)

sce57aATACct <- sce57aATAC[,sce57aATAC$SingleR=="Astro"]
plotPCAs(counts(sce57aATACct), colData=colData(sce57aATACct), shapeBy="genotype", colorBy="replicate", returnOnly=TRUE)

13.1 Normalization

We can try to normalize data to do a better comparison across the conditions, removing biases and improving the quality of the further differential analysis.

library(RUVSeq)
## Loading required package: EDASeq
## Loading required package: ShortRead
## Loading required package: BiocParallel
## Loading required package: Biostrings
## Loading required package: XVector
## 
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
## 
##     strsplit
## Loading required package: Rsamtools
## Loading required package: GenomicAlignments
## 
## Attaching package: 'EDASeq'
## The following object is masked from 'package:scater':
## 
##     plotRLE
## Loading required package: edgeR
## Loading required package: limma
## 
## Attaching package: 'limma'
## The following object is masked from 'package:scater':
## 
##     plotMDS
## The following object is masked from 'package:BiocGenerics':
## 
##     plotMA
## 
## Attaching package: 'edgeR'
## The following object is masked from 'package:SingleCellExperiment':
## 
##     cpm
gr <- makeGroups(sce57aATACct$genotype)
nn <- RUVs(counts(sce57aATACct), cIdx=rownames(sce57aATACct), k=1, scIdx=gr)
plotPCAs(nn$normalizedCounts, colData=colData(sce57aATACct), shapeBy="genotype", colorBy="replicate", returnOnly=TRUE)  

sce57aATACct$RUVs_W <- nn$W

13.2 Differential Analysis

We can now do the Differential Analysis by using edgeR, but as we can notice, there are no DARs between these two time-points.

DARsdf <- applyEdgeR(counts(sce57aATACct), colData=colData(sce57aATACct), 
    factors="genotype", wnames="RUVs_W", contrasts="AD - WT", p.threshold=1)[[1]]
head(DARsdf)
##                             AD    WT     logFC    logCPM         F      PValue
## chr1:24612324-24613211   793.0 306.5  2.078198 10.620801 130.74191 0.000036715
## chr17:28522520-28523450   34.0  15.5  2.042990  6.142520  32.39621 0.001482914
## chr2:104816515-104817362   6.5  44.5 -2.024763  5.960796  25.52238 0.002434473
## chr2:22589318-22590202    13.0   0.0  7.100805  4.578081  31.53548 0.002549794
## chr1:75925511-75926384    11.5   1.0  5.021054  4.508686  23.13978 0.003360042
## chr1:173367295-173368146  13.0   8.5  4.437120  5.035693  22.07011 0.003399679
##                          FDR
## chr1:24612324-24613211     1
## chr17:28522520-28523450    1
## chr2:104816515-104817362   1
## chr2:22589318-22590202     1
## chr1:75925511-75926384     1
## chr1:173367295-173368146   1

13.3 Another exaple

At this point we are interested in looking at the differences between celltypes but at different time-points.

By comparing time-point 5.7 for the WT genotype with the 17.9 for the AD genotype.

As we can notice, there are very few DARs emerging, even when we are able to explain a very high percentage of variance with the first Principal Component. This highlights the very low number of DARs with this data.

library(darioscripts)
sce57179aATAC <- readRDS("RDS/sce_alz_57_179_ps_ATAC.RDS")
sce57179aATAC$replicate <- gsub("rep4", "rep2", sce57179aATAC$replicate)
sce57179aATAC$replicate <- gsub("rep5", "rep3", sce57179aATAC$replicate)
sce57179aATAC$cnames <- paste0(sce57179aATAC$SingleR, "_", sce57179aATAC$genotype, "_" , sce57179aATAC$replicate)
colnames(sce57179aATAC) <- sce57179aATAC$cnames

plotPCAs(counts(sce57179aATAC),colData=colData(sce57179aATAC), shapeBy="genotype", colorBy="SingleR", returnOnly=TRUE)+ scale_color_manual(values = palette)

sce57179aATACct <- sce57179aATAC[,sce57179aATAC$SingleR=="Oligo"]
plotPCAs(counts(sce57179aATACct),colData=colData(sce57179aATACct), shapeBy="genotype", colorBy="replicate", returnOnly=TRUE)

library(RUVSeq)
gr <- makeGroups(sce57179aATACct$gcondition)
nn <- RUVs(counts(sce57179aATACct), cIdx=rownames(sce57179aATACct), k=1, scIdx=gr)
# plotPCAs(nn$normalizedCounts, colData=colData(sce57179aATACct), shapeBy="genotype", colorBy="replicate")  
sce57179aATACct$RUVs_W <- nn$W
DARsdf <- applyEdgeR(counts(sce57179aATACct), colData=colData(sce57179aATACct), 
    factors="genotype", wnames="RUVs_W", contrasts="AD - WT", p.threshold=1)[[1]]
DARsdf[DARsdf$FDR<0.05,]
##                             AD    WT      logFC   logCPM        F       PValue
## chr8:71719064-71719748    54.5   5.0  3.2906960 5.316290 50.46845 1.222990e-12
## chr6:47650571-47651505    24.5  87.0 -1.9533120 6.228171 49.10734 2.446125e-12
## chr1:191416480-191417285  46.5   4.0  3.5212992 5.096328 44.58244 2.457672e-11
## chr9:106552597-106553526  32.0   0.5  5.6446653 4.517222 42.64096 6.623653e-11
## JH584304.1:59268-60131   414.5 199.5  0.9290956 8.607039 41.34139 1.286912e-10
## chrX:123103090-123103991  30.0   0.5  6.0266143 4.431231 40.34334 2.143934e-10
## chr2:98664670-98665464    74.0 149.0 -1.1357649 7.201682 32.57150 1.153709e-08
## chr1:33814009-33814885    33.0  85.0 -1.4755935 6.295939 31.11818 2.437224e-08
## chr17:34743618-34744457   43.0   7.5  2.3674853 5.100544 26.43728 2.729996e-07
## chr2:154603311-154604225 135.0  58.0  1.0929834 6.958905 26.30558 2.922562e-07
## chr6:87482434-87483293     8.0  38.0 -2.3623815 5.011214 25.51453 4.402160e-07
## chr2:131909466-131910353 101.5  41.5  1.1535168 6.538908 22.89238 1.716943e-06
## chr5:136224386-136225246  28.5   3.0  3.1028647 4.476778 22.74884 1.850036e-06
##                                   FDR
## chr8:71719064-71719748   8.183514e-08
## chr6:47650571-47651505   8.184001e-08
## chr1:191416480-191417285 5.481756e-07
## chr9:106552597-106553526 1.108038e-06
## JH584304.1:59268-60131   1.722248e-06
## chrX:123103090-123103991 2.390987e-06
## chr2:98664670-98665464   1.102847e-04
## chr1:33814009-33814885   2.038555e-04
## chr17:34743618-34744457  1.955603e-03
## chr2:154603311-154604225 1.955603e-03
## chr6:87482434-87483293   2.677874e-03
## chr2:131909466-131910353 9.522563e-03
## chr5:136224386-136225246 9.522563e-03
DARsdf$gene <- rownames(DARsdf)

PlotVolcanoPlot(DARsdf, as.data.frame(counts(sce57179aATACct)), 
                design.matrix=colData(sce57179aATACct), show.plot.flag=TRUE, threshold=0.1)
## Loading required package: plotly
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:XVector':
## 
##     slice
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following objects are masked from 'package:ensembldb':
## 
##     filter, select
## The following object is masked from 'package:AnnotationDbi':
## 
##     select
## The following object is masked from 'package:IRanges':
## 
##     slice
## The following object is masked from 'package:S4Vectors':
## 
##     rename
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
## Warning in geom_point(aes(x = log2FoldChange, y = minuslog10pval, color =
## significance, : Ignoring unknown aesthetics: padj and name

14 MACS2 peaks

If we are not satisfied with cellranger-ARC peaks (emerged also with the number of detected DARs), as many authors suggests, we can take advantage of the MACS2 peak caller.

MACS2 is a widely used peak caller for bulk genomics data, it is a shell command and it has functionalities for narrow and broad peaks.

Also in this case, because we want to construct pseudo-bulk like data for our ATAC assays, we need to split the original cellranger BAM files in multiple BAM files, one per each cell type. (see section Creating BAM files per cell types)

We created an ad-hoc wrapper for working with MACS2 and previously constructed BAM files taking a SingleCellExperiment as input.

wrapMacs2(sce, outdir="peaks_dir", genome="mm", extsize=200,
        broadCall="--broad", bampath=metadata(sce)$ct_bams)

15 Session Info

sessionInfo()
## R version 4.2.0 (2022-04-22)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] plotly_4.10.1                  RUVSeq_1.30.0                 
##  [3] edgeR_3.38.4                   limma_3.52.4                  
##  [5] EDASeq_2.30.0                  ShortRead_1.54.0              
##  [7] GenomicAlignments_1.32.1       Rsamtools_2.12.0              
##  [9] Biostrings_2.64.1              XVector_0.36.0                
## [11] BiocParallel_1.30.4            RColorBrewer_1.1-3            
## [13] HDF5Array_1.24.2               rhdf5_2.40.0                  
## [15] DelayedArray_0.22.0            Matrix_1.5-3                  
## [17] darioscripts_0.1.0             scater_1.24.0                 
## [19] ggplot2_3.4.0                  scuttle_1.6.3                 
## [21] EnsDb.Mmusculus.v79_2.99.0     ensembldb_2.20.2              
## [23] AnnotationFilter_1.20.0        GenomicFeatures_1.48.4        
## [25] AnnotationDbi_1.58.0           SeuratObject_4.1.3            
## [27] Seurat_4.3.0                   Signac_1.9.0                  
## [29] AllenInstituteBrainData_0.99.0 TENxMultiomeTools_0.1.3       
## [31] SingleCellExperiment_1.18.1    SummarizedExperiment_1.26.1   
## [33] Biobase_2.56.0                 GenomicRanges_1.48.0          
## [35] GenomeInfoDb_1.32.4            IRanges_2.30.1                
## [37] S4Vectors_0.34.0               BiocGenerics_0.42.0           
## [39] MatrixGenerics_1.8.1           matrixStats_0.63.0            
## [41] BiocStyle_2.24.0               rmarkdown_2.20                
## 
## loaded via a namespace (and not attached):
##   [1] rsvd_1.0.5                Hmisc_4.7-2              
##   [3] ica_1.0-3                 zinbwave_1.18.0          
##   [5] RcppRoll_0.3.0            foreach_1.5.2            
##   [7] lmtest_0.9-40             crayon_1.5.2             
##   [9] MASS_7.3-57               rhdf5filters_1.8.0       
##  [11] nlme_3.1-157              backports_1.4.1          
##  [13] rlang_1.0.6               ROCR_1.0-11              
##  [15] irlba_2.3.5.1             ca_0.71.1                
##  [17] phylobase_0.8.10          filelock_1.0.2           
##  [19] xgboost_1.7.3.1           rjson_0.2.21             
##  [21] bit64_4.0.5               glue_1.6.2               
##  [23] pheatmap_1.0.12           scDblFinder_1.10.0       
##  [25] rngtools_1.5.2            sctransform_0.3.5        
##  [27] parallel_4.2.0            vipor_0.4.5              
##  [29] spatstat.sparse_3.0-0     spatstat.geom_3.0-5      
##  [31] tidyselect_1.2.0          fitdistrplus_1.1-8       
##  [33] XML_3.99-0.13             tidyr_1.3.0              
##  [35] zoo_1.8-11                ggpubr_0.5.0             
##  [37] org.Mm.eg.db_3.15.0       xtable_1.8-4             
##  [39] magrittr_2.0.3            evaluate_0.20            
##  [41] cli_3.6.0                 zlibbioc_1.42.0          
##  [43] hwriter_1.3.2.1           rstudioapi_0.14          
##  [45] miniUI_0.1.1.1            sp_1.6-0                 
##  [47] rpart_4.1.16              bslib_0.4.2              
##  [49] fastmatch_1.1-3           locfdr_1.1-8             
##  [51] shiny_1.7.4               BiocSingular_1.12.0      
##  [53] xfun_0.36                 cluster_2.1.3            
##  [55] TSP_1.2-4                 KEGGREST_1.36.3          
##  [57] tibble_3.1.8              ggrepel_0.9.2            
##  [59] biovizBase_1.44.0         ape_5.6-2                
##  [61] dendextend_1.16.0         listenv_0.9.0            
##  [63] png_0.1-8                 future_1.30.0            
##  [65] withr_2.5.0               bitops_1.0-7             
##  [67] plyr_1.8.8                dqrng_0.3.0              
##  [69] pillar_1.8.1              cachem_1.0.6             
##  [71] kernlab_0.9-31            DelayedMatrixStats_1.18.2
##  [73] vctrs_0.5.2               ellipsis_0.3.2           
##  [75] generics_0.1.3            NMF_0.26                 
##  [77] tools_4.2.0               foreign_0.8-82           
##  [79] rncl_0.8.7                NOISeq_2.40.0            
##  [81] beeswarm_0.4.0            munsell_0.5.0            
##  [83] fastmap_1.1.0             compiler_4.2.0           
##  [85] abind_1.4-5               httpuv_1.6.8             
##  [87] rtracklayer_1.56.1        GenomeInfoDbData_1.2.8   
##  [89] gridExtra_2.3             lattice_0.20-45          
##  [91] deldir_1.0-6              utf8_1.2.2               
##  [93] later_1.3.0               dplyr_1.0.10             
##  [95] BiocFileCache_2.4.0       jsonlite_1.8.4           
##  [97] scales_1.2.1              ScaledMatrix_1.4.1       
##  [99] pbapply_1.7-0             carData_3.0-5            
## [101] sparseMatrixStats_1.8.0   genefilter_1.78.0        
## [103] lazyeval_0.2.2            promises_1.2.0.1         
## [105] car_3.1-1                 doParallel_1.0.17        
## [107] latticeExtra_0.6-30       R.utils_2.12.2           
## [109] goftest_1.2-3             checkmate_2.1.0          
## [111] spatstat.utils_3.0-1      reticulate_1.27          
## [113] cowplot_1.1.1             webshot_0.5.5            
## [115] statmod_1.5.0             Rtsne_0.16               
## [117] dichromat_2.0-0.1         BSgenome_1.64.0          
## [119] softImpute_1.4-1          uwot_0.1.14              
## [121] igraph_1.3.5              survival_3.3-1           
## [123] yaml_2.3.7                htmltools_0.5.4          
## [125] memoise_2.0.1             VariantAnnotation_1.42.1 
## [127] BiocIO_1.6.0              seriation_1.5.1          
## [129] locfit_1.5-9.7            viridisLite_0.4.1        
## [131] digest_0.6.31             assertthat_0.2.1         
## [133] mime_0.12                 rappdirs_0.3.3           
## [135] SingleR_1.10.0            registry_0.5-1           
## [137] RSQLite_2.2.20            future.apply_1.10.0      
## [139] data.table_1.14.6         blob_1.2.3               
## [141] R.oo_1.25.0               RNeXML_2.4.11            
## [143] preprocessCore_1.58.0     labeling_0.4.2           
## [145] Formula_1.2-4             splines_4.2.0            
## [147] Rhdf5lib_1.18.2           ProtGenerics_1.28.0      
## [149] RCurl_1.98-1.10           broom_1.0.3              
## [151] hms_1.1.2                 base64enc_0.1-3          
## [153] colorspace_2.1-0          DropletUtils_1.16.0      
## [155] BiocManager_1.30.19       ggbeeswarm_0.7.1         
## [157] nnet_7.3-17               sass_0.4.5               
## [159] Rcpp_1.0.10               bookdown_0.32            
## [161] RANN_2.6.1                fansi_1.0.4              
## [163] parallelly_1.34.0         R6_2.5.1                 
## [165] grid_4.2.0                ggridges_0.5.4           
## [167] lifecycle_1.0.3           formatR_1.14             
## [169] writexl_1.4.2             bluster_1.6.0            
## [171] curl_5.0.0                ggsignif_0.6.4           
## [173] leiden_0.4.3              jquerylib_0.1.4          
## [175] howmany_0.3-1             RcppAnnoy_0.0.20         
## [177] iterators_1.0.14          spatstat.explore_3.0-6   
## [179] stringr_1.5.0             htmlwidgets_1.6.1        
## [181] beachmat_2.12.0           polyclip_1.10-4          
## [183] biomaRt_2.52.0            purrr_1.0.1              
## [185] globals_0.16.2            htmlTable_2.4.1          
## [187] patchwork_1.1.2           spatstat.random_3.1-3    
## [189] clusterExperiment_2.16.0  progressr_0.13.0         
## [191] codetools_0.2-18          metapod_1.4.0            
## [193] prettyunits_1.1.1         dbplyr_2.3.0             
## [195] gridBase_0.4-7            R.methodsS3_1.8.2        
## [197] gtable_0.3.1              DBI_1.1.3                
## [199] aroma.light_3.26.0        highr_0.10               
## [201] tensor_1.5                httr_1.4.4               
## [203] KernSmooth_2.23-20        stringi_1.7.12           
## [205] progress_1.2.2            farver_2.1.1             
## [207] reshape2_1.4.4            uuid_1.1-1               
## [209] heatmaply_1.4.2           annotate_1.74.0          
## [211] viridis_0.6.2             magick_2.7.3             
## [213] xml2_1.3.3                BiocNeighbors_1.14.0     
## [215] restfulr_0.0.15           interp_1.1-3             
## [217] ade4_1.7-20               scattermore_0.8          
## [219] scran_1.24.1              bit_4.0.5                
## [221] jpeg_0.1-10               spatstat.data_3.0-0      
## [223] pkgconfig_2.0.3           rstatix_0.7.1            
## [225] knitr_1.42